
Cherry-pick #1728 to r0.4.0 (Qwen refs removed) #2030

Closed
pthombre wants to merge 1 commit into r0.4.0 from cherry-pick-1728-r0.4.0

pthombre (Contributor) commented Apr 23, 2026

Summary

Manual cherry-pick of #1728 (feat: Add diffusion finetuning CI pipeline for nightly runs) onto r0.4.0, with all QwenImage-specific additions removed. The underlying Qwen-Image support (#1704, #1976) is not on r0.4.0, so CI wiring for it would be incomplete and would target a model this branch cannot run.

What is excluded (vs. the original #1728)

  • examples/diffusion/finetune/qwen_image_t2i_flow.yaml — not created on r0.4.0
  • "QwenImagePipeline": "image" entry in examples/diffusion/generate/generate.py
  • - qwen_image_t2i_flow.yaml entry in tests/ci_tests/configs/diffusion_finetune/nightly_recipes.yml
  • qwen_image_t2i_flow*) case block in tests/ci_tests/scripts/diffusion_finetune_launcher.sh

What lands

CI infra for the models already supported on r0.4.0: Wan2.1, HunyuanVideo, Flux. New files:

  • tests/ci_tests/configs/diffusion_finetune/nightly_recipes.yml
  • tests/ci_tests/configs/diffusion_finetune/override_recipes.yml
  • tests/ci_tests/scripts/diffusion_finetune_launcher.sh

It also appends CI sections to the existing Wan, HunyuanVideo, and Flux recipe YAMLs and updates tests/ci_tests/utils/generate_ci_tests.py.

Test plan

  • CI passes on r0.4.0 with the cherry-pick
  • Nightly diffusion finetune pipeline runs the Wan/Hunyuan/Flux recipes successfully
  • Confirm no dangling QwenImage references remain in any file touched on this branch
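The last test-plan item can be checked mechanically. The sketch below is a hypothetical helper, not part of this PR: it scans a list of files for any spelling of "QwenImage" / "qwen_image" and reports each matching line. The regex and the function name `find_qwen_refs` are assumptions; files that no longer exist on the branch are skipped.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches "QwenImage", "QwenImagePipeline",
# "qwen_image_t2i_flow", "qwen-image" in any letter case.
QWEN_PATTERN = re.compile(r"qwen[_-]?image", re.IGNORECASE)


def find_qwen_refs(paths):
    """Return (path, line_number, stripped_line) for every line mentioning Qwen-Image."""
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():  # skip paths deleted or absent on this branch
            continue
        text = p.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if QWEN_PATTERN.search(line):
                hits.append((str(p), lineno, line.strip()))
    return hits
```

One way to feed it the files touched on this branch is the output of `git diff --name-only r0.4.0...HEAD`; an empty result would indicate no dangling references in those files.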

🤖 Generated with Claude Code

* feat: Add diffusion pipelines for nightly runs

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Reduce CI runtime to 30 minutes

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* debug: Check if HF_TOKEN is set

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* test: revert test variables

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* feat: add HunyuanVideo nightly CI test and parameterize diffusion launcher

Add HunyuanVideo-1.5 to the diffusion finetuning CI pipeline alongside
Wan2.1. Parameterize the launcher script to derive model-specific settings
(processor, generate config, model name, frame counts) from the recipe
config name. Also fix a pre-existing T5 layer norm compatibility issue
in finetune.py that affects Hunyuan training with incompatible apex builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* style: ruff format on modified files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* revert: remove patch_t5_layer_norm from finetune.py

The patch was a workaround for an ABI-incompatible apex build on a
specific compute node, not a code issue. CI Docker builds apex from
source so it is not needed there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* feat: add Flux and QwenImage T2I nightly CI tests

Extend the diffusion nightly CI pipeline to support text-to-image models
(Flux and QwenImage) alongside the existing text-to-video models (Wan,
HunyuanVideo). Uses the diffusers/tuxemon dataset for image CI smoke tests.

Changes:
- Add MEDIA_TYPE branching in launcher for image vs video stages
- Add tuxemon dataset download/extraction with JSONL captions
- Add image preprocessing and .png inference verification paths
- Add ci: sections to flux_t2i_flow.yaml and qwen_image_t2i_flow.yaml
- Register QwenImagePipeline in generate.py output type mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
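The launcher derives per-model settings from the recipe config name and branches on MEDIA_TYPE for image vs. video stages. The actual launcher is a shell `case` statement; the Python sketch below only illustrates the same first-match dispatch idea for the models that land on r0.4.0. The glob patterns for Wan and HunyuanVideo, the `.mp4` video suffix, and all names here are hypothetical; only `flux_t2i_flow` and the `.png` image check come from the description above.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Sequence


class RecipeSettings(NamedTuple):
    media_type: str     # "image" or "video", selects the launcher branch
    output_suffix: str  # file extension expected from the inference stage


# Hypothetical table mirroring a shell `case`: first matching pattern wins.
# No QwenImage entry, since that support is not on r0.4.0.
_RECIPE_TABLE = [
    ("wan*", RecipeSettings("video", ".mp4")),      # pattern and suffix assumed
    ("hunyuan*", RecipeSettings("video", ".mp4")),  # pattern and suffix assumed
    ("flux_t2i_flow*", RecipeSettings("image", ".png")),
]


def settings_for(config_name: str) -> RecipeSettings:
    """Look up settings by recipe config name; unknown recipes fail loudly."""
    for pattern, settings in _RECIPE_TABLE:
        if fnmatchcase(config_name, pattern):  # case-sensitive, like shell globs
            return settings
    raise ValueError(f"unsupported recipe config: {config_name}")


def verify_output(config_name: str, produced: Sequence[str]) -> bool:
    """True if inference produced at least one file of the expected type."""
    suffix = settings_for(config_name).output_suffix
    return any(path.endswith(suffix) for path in produced)
```

In this sketch a leftover `qwen_image_t2i_flow.yaml` entry raises instead of silently falling through, which is one way to make a half-removed model visible in CI.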

---------

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
copy-pr-bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


pthombre closed this Apr 23, 2026
pthombre changed the title from "Cherry-pick #1728 to r0.4.0" to "Cherry-pick #1728 to r0.4.0 (Qwen refs removed)" Apr 24, 2026